AITopics | sparse linear regression

Collaborating Authors

sparse linear regression

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

Neural Information Processing SystemsMar-22-2026, 16:04:43 GMT

Despite the remarkable success of transformer-based models in various real-world tasks, their underlying mechanisms remain poorly understood. Recent studies have suggested that transformers can implement gradient descent as an in-context learner for linear regression problems and have developed various theoretical analyses accordingly. However, these works mostly focus on the expressive power of transformers by designing specific parameter constructions, lacking a comprehensive understanding of their inherent working mechanisms post-training. In this study, we consider a sparse linear regression problem and investigate how a trained multi-head transformer performs in-context learning. We experimentally discover that the utilization of multi-heads exhibits different patterns across layers: multiple heads are utilized and essential in the first layer, while usually only a single head is sufficient for subsequent layers. We provide a theoretical explanation for this observation: the first layer preprocesses the context data, and the following layers execute simple optimization steps based on the preprocessed context. Moreover, we demonstrate that such a preprocess-then-optimize algorithm can significantly outperform naive gradient descent and ridge regression algorithms. Further experimental results support our explanations. Our findings offer insights into the benefits of multi-head attention and contribute to understanding the more intricate mechanisms hidden within trained transformers.

artificial intelligence, machine learning, transformer utilize multi-head attention, (11 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Efficient Sublinear-Regret Algorithms for Online Sparse Linear Regression with Limited Observation

Neural Information Processing SystemsMar-17-2026, 15:08:39 GMT

Online sparse linear regression is the task of applying linear regression analysis to examples arriving sequentially subject to a resource constraint that a limited number of features of examples can be observed. Despite its importance in many practical applications, it has been recently shown that there is no polynomial-time sublinear-regret algorithm unless NP$\subseteq$BPP, and only an exponential-time sublinear-regret algorithm has been found. In this paper, we introduce mild assumptions to solve the problem.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.95)

Add feedback

Sparse PCA from Sparse Linear Regression

Guy Bresler, Sung Min Park, Madalina Persu

Neural Information Processing SystemsMar-15-2026, 16:54:26 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, statistics, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > Quebec > Montreal (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.41)

Add feedback

Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation

Tomoya Murata, Taiji Suzuki

Neural Information Processing SystemsFeb-14-2026, 12:07:33 GMT

In real-world sequential prediction scenarios, the features (or attributes) of examples are typically high-dimensional and construction of the all features for each example may be expensive or impossible.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)

Add feedback

d5b3d8dadd770c460b1cde910a711987-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 08:45:35 GMT

Estimating information from structured data is acentral theme in statistics that by now has found applications in a wide array of disciplines.

artificial intelligence, estimator, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.16)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Add feedback

9a8d52eb05eb7b13f54b3d9eada667b7-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 23:46:24 GMT

matrix, preconditioner, probability, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

LowerBoundsonRandomlyPreconditionedLasso viaRobustSparseDesigns

Neural Information Processing SystemsFeb-10-2026, 23:46:20 GMT

However, this lower bound only holds against deterministic preconditioners, and in many contexts randomization is crucial to the success of preconditioners. We prove a stronger lower bound that rules out randomized preconditioners.

artificial intelligence, machine learning, preconditioner, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Add feedback

7a006957be65e608e863301eb98e1808-Supplemental.pdf

Neural Information Processing SystemsFeb-9-2026, 01:24:30 GMT

In Appendix A, we review some statistical results for sparse linear regression. We review some classical results in sparse linear regression. Let the design matrix beX = (x1,...,xn)> Rn d. Second, we derive a regret lower bound of alternative banditeθ.

artificial intelligence, machine learning, supp, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Add feedback

Feature Adaptation for Sparse Linear Regression

Neural Information Processing SystemsDec-25-2025, 14:45:39 GMT

Sparse linear regression is a central problem in high-dimensional statistics. We study the correlated random design setting, where the covariates are drawn from a multivariate Gaussian $N(0,\Sigma)$, and we seek an estimator with small excess risk. If the true signal is $t$-sparse, information-theoretically, it is possible to achieve strong recovery guarantees with only $O(t\log n)$ samples. However, computationally efficient algorithms have sample complexity linear in (some variant of) the *condition number* of $\Sigma$. Classical algorithms such as the Lasso can require significantly more samples than necessary even if there is only a single sparse approximate dependency among the covariates.We provide a polynomial-time algorithm that, given $\Sigma$, automatically adapts the Lasso to tolerate a small number of approximate dependencies. In particular, we achieve near-optimal sample complexity for constant sparsity and if $\Sigma$ has few ``outlier'' eigenvalues.Our algorithm fits into a broader framework of *feature adaptation* for sparse linear regression with ill-conditioned covariates. With this framework, we additionally provide the first polynomial-factor improvement over brute-force search for constant sparsity $t$ and arbitrary covariance $\Sigma$.

feature adaptation, name change, sparse linear regression, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.66)

Add feedback

Filters

Collaborating Authors

sparse linear regression

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

Efficient Sublinear-Regret Algorithms for Online Sparse Linear Regression with Limited Observation

Sparse PCA from Sparse Linear Regression

Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation

5f999632c48f87cffb214e575581e4a9-Paper-Conference.pdf

d5b3d8dadd770c460b1cde910a711987-Paper.pdf

9a8d52eb05eb7b13f54b3d9eada667b7-Supplemental-Conference.pdf

LowerBoundsonRandomlyPreconditionedLasso viaRobustSparseDesigns

7a006957be65e608e863301eb98e1808-Supplemental.pdf

Feature Adaptation for Sparse Linear Regression